Hierarchical Categorization of Open Source Software by Online Profiles

Authors

  • Tao Wang
  • Huaimin Wang
  • Gang Yin
  • Cheng Yang
  • Xiang Li
  • Peng Zou

Abstract

The large amount of open source software freely available on the Internet is fundamentally changing the traditional paradigms of software development. Efficient categorization of this massive collection of projects, so that relevant software can be retrieved, is vital for Internet-based software development activities such as searching for solutions and learning best practices. Much previous work on software categorization mines source code or byte code, but it has been verified only on relatively small collections of projects with coarse-grained categories or clusters. Internet-based software development, however, requires finer-grained, more scalable, and language-independent categorization approaches. In this paper, we propose a novel approach that hierarchically categorizes software projects based on their online profiles. We design an SVM-based categorization framework and adopt a weighted combination strategy to aggregate different types of profile attributes from multiple repositories. We employ and compare different basic classification algorithms and feature selection techniques. Extensive experiments are carried out on more than 21,000 projects across five repositories. The results show that weighted combination yields significant improvements. Compared to previous work, our approach achieves competitive results with a finer-grained, multi-layered category hierarchy of more than 120 categories. Unlike approaches that use source code or byte code, our approach is more effective for large-scale, language-independent software categorization. In addition, experiments suggest that combining hierarchical categorization with general keyword-based searching improves retrieval efficiency and accuracy.

Key words: open source software, software profile, hierarchical categorization, software retrieval
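The abstract describes aggregating per-attribute results with a weighted combination and assigning projects to a multi-layered category hierarchy, but it does not give the weighting formula or how the hierarchy is traversed. The following is a minimal sketch under stated assumptions, not the authors' implementation: each profile attribute (hypothetically `description` and `tags`) has already been scored against candidate categories by some per-attribute classifier (for instance, SVM decision values), the scores are merged by a linear weighted sum, and the hierarchy is walked greedily from a hypothetical `ROOT` node. All category names, weights, and scores are invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical score table: category name -> classifier score.
Scores = Dict[str, float]


def combine_scores(per_attribute: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Assumed linear weighted sum of per-attribute category scores.

    per_attribute maps an attribute name (e.g. "description") to the scores a
    per-attribute classifier assigned to each candidate category.
    Attributes missing from `weights` contribute nothing.
    """
    combined: Scores = {}
    for attr, scores in per_attribute.items():
        w = weights.get(attr, 0.0)
        for category, s in scores.items():
            combined[category] = combined.get(category, 0.0) + w * s
    return combined


def categorize_top_down(
    hierarchy: Dict[str, List[str]],
    score_fn: Callable[[str, List[str]], Scores],
    root: str = "ROOT",
) -> List[str]:
    """Greedy top-down descent (an assumption, not the paper's stated method).

    At each internal node, score its children and move to the best one;
    stop at a node that has no children. Returns the path below the root.
    """
    path: List[str] = []
    node = root
    while hierarchy.get(node):
        children = hierarchy[node]
        scores = score_fn(node, children)
        node = max(children, key=lambda c: scores.get(c, float("-inf")))
        path.append(node)
    return path


# Toy hierarchy and scores, invented purely for illustration.
hierarchy = {
    "ROOT": ["Software Development", "Internet"],
    "Software Development": ["Compilers", "Testing"],
}
toy_scores = {
    "ROOT": {
        "description": {"Software Development": 0.8, "Internet": 0.1},
        "tags": {"Software Development": 0.2, "Internet": 0.9},
    },
    "Software Development": {
        "description": {"Compilers": 0.6, "Testing": 0.3},
        "tags": {"Compilers": 0.1, "Testing": 0.5},
    },
}
weights = {"description": 0.7, "tags": 0.3}  # hypothetical weights


def toy_score_fn(node: str, children: List[str]) -> Scores:
    # Look up the precomputed per-attribute scores for this node and combine them.
    return combine_scores(toy_scores[node], weights)


print(categorize_top_down(hierarchy, toy_score_fn))
# → ['Software Development', 'Compilers']
```

With these hypothetical weights, Software Development scores 0.7 × 0.8 + 0.3 × 0.2 = 0.62 at the root versus 0.34 for Internet, even though the tags alone favor Internet; different weights can flip the branch taken. A greedy descent also means an error at an upper level cannot be corrected further down.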

Similar sources

Knowledge Management in Railway Industry: A Conceptual Model Based on Open Innovation and online Communities

Organizations need to be capable of attracting external knowledge. This activity is closely related to the innovation process, and particularly to the open innovation approach. Therefore, this qualitative research is designed to identify the dimensions and components for providing a conceptual model of KM architecture by open innovation approach based on online communities in the grounded theory frame...

The SQO-OSS Quality Model: Measurement Based Open Source Software Evaluation

Software quality evaluation has always been an important part of software business. The quality evaluation process is usually based on hierarchical quality models that measure various aspects of software quality and deduce a characterization of the product quality being evaluated. The particular nature of open source software has rendered existing models inappropriate for detailed quality evalu...

Behavior-Based Online Anomaly Detection for a Nationwide Short Message Service

As fraudsters understand the time window and act fast, real-time fraud management systems become necessary in the telecommunication industry. In this work, by analyzing traces collected from a nationwide cellular network over a period of a month, an online behavior-based anomaly detection system is provided. Over time, users' interactions with the network provide a vast amount of usage data. Thes...

Reusing Open-Source Software and Practices: The Impact of Open-Source on Commercial Vendors

One of the most intriguing ways that commercial developers of software can become more efficient is to reuse not only software but also best practices from the open-source movement. The open-source movement encompasses a wide collection of ideas, knowledge, techniques, and solutions. Commercial software vendors have an opportunity to both learn from the open-source community, as well as leverage...

Harnessing the Power of Self-Organization in an Online Community During Organizational Crisis

The technology platform of FEU’s online forum is an open source software product called Discuz!NT. As of May 26, 2008, the version of Discuz!NT powering FEU’s online forum was Discuz!NT 2.1. This version provided basic online message board features including subforums, user profiles, photo uploading, and file attachments. Compared to today’s online forums, Discuz!NT 2.1 did not have a tag featu...

Journal: IEICE Transactions
Volume: 97-D
Publication date: 2014